SUMMARY


Overview 

Three experiments show that in hindsight people systematically exaggerate the predictability of the results of scientific experiments. This judgmental bias has implications for the management of scientific research programs, the conduct of experimental research, and the review of scientific manuscripts.

Background 


Much scientific research is conducted in two stages, the first of which is a pretest intended to see how viable the research design is and what sort of results it produces. After these initial results have been obtained, a decision must be made whether to continue with the full research project. Responsibility for the decision may lie either with the scientist conducting the study or with a research manager. This decision can be made either before or after the pretest, with the corresponding questions being "If Result X is obtained, what should be done?" and "Given that Result X has been obtained, what should be done?" The answers to these two questions should be the same, since whether or not Result X is actually in hand should not affect its impact.

Findings 


A series of three studies involving 463 people showed that this is typically not the case. Once the results of a pretest are in hand, they are viewed as much less surprising and much more likely to be replicated than they seemed in foresight. This finding was obtained with pretest results for a variety of different experiments. It appears that once a result has been reported, even from a sample of one, people feel that it more or less had to happen and that it is very likely to be obtained on reruns of the same experiment. In an effort to bring people's hindsight perceptions of the meaning of pretest results more in line with their foresight perceptions, people told the result of the experiment were required to show how they would have explained the opposite pretest result, had it been obtained. This manipulation reduced the bias somewhat, but did not eliminate it.

Implications 

If we underestimate the surprisingness of scientific results, we may also underestimate how much we have learned from them and overestimate how much we ourselves know without the benefit of such research. The practical implications of this judgmental bias depend on the judge's role in the research world. It could encourage a critic of research expenditures to say, "What do we need these studies for? They are only telling us things we already know." It could lead a research manager to ask himself, "Why did I decide to go ahead with that project when its results were so easily foretold?" It could lead a scientist to curtail a research program after receiving results from a pretest, without realizing how statistically unstable such results are. It could lead the editor or reviewers of scientific journals to reject manuscripts because the results they report seem inevitable. Indeed, this paper begins with a selection of such reasons for rejection culled from the files of one of the most prestigious psychological journals.

No proven way of overcoming this bias is known at present, and the article urges extreme caution in assessing the surprisingness of research results.
ON THE PSYCHOLOGY OF EXPERIMENTAL SURPRISES:
OUTCOME KNOWLEDGE AND THE JOURNAL REVIEW PROCESS

INTRODUCTION

Consider the following excerpts from critiques of manuscripts submitted for editorial review.

"The present experiment does not tell us much that is new."

"None of the results appear terribly surprising. The author has used an elephant gun to kill a flea."

"I find the willingness of the author to obtain this result in yet another context slightly depressing."

"The reaction of the readers of (this journal) to this paper would be one of: 'of course'."

"What is clear is that (the authors) had in me a reader whose prior was on the order of .95 or more. By how much could they increase it?"

"I must apologize to you and the manuscript's author for the delay in responding to the manuscript. Part of my problem was in deciding why I could not recommend publication of a study with which I found no flaws. The paper is well written, the studies were well designed and conducted, and I do not feel that reading the paper was a waste of my time. Nevertheless, I could not escape the feeling that the paper merely shows to be false a hypothesis one can hardly take seriously to begin with."

In each of these examples, the reviewer recommended rejecting an article that was technically competent because the results appeared too predictable and unsurprising. Certainly this is a legitimate selection criterion. Results that are wholly predictable, and thus fail to increase our scientific knowledge, hardly merit readers' time and valuable journal space.


Presumably, no one is better qualified to make this sort of judgment than the same reviewers who are selected for their ability to evaluate methodological competence, clarity of presentation, adequacy of literature review, and so on. Yet judging the predictability of results requires a rather special kind of competence. The reviewer must ignore the results he or she has just seen reported and ascertain how likely they seemed before the experiment was performed. Some recent findings by Fischhoff (1975a, 1975b) and Fischhoff and Beyth (1975) suggest that such judgments may be problematic. Their results showed that reporting the outcome of a historical event increases the perceived likelihood of that outcome, and that people underestimate the effect of outcome knowledge on their perceptions. As a result, people believe that they would have seen in foresight the relative inevitability of the reported outcome, which in fact was only apparent in hindsight. Thus, they exaggerate the predictability of reported outcomes.

It seems plausible that similar effects might occur when reviewing the results of scientific research. Once we hear experimental findings, we may tend to feel as though we "knew all along" that they would come out that way. If this is the case, then reviewers may systematically exaggerate the predictability of the findings they evaluate and, as a result, be unduly severe in their criticism.

The experiments reported below examine this possibility. Subjects in Experiments I and II read descriptions of a number of studies from different scientific disciplines, each of which had two possible outcomes. Foresight subjects were told that a single subject was about to be tested in each experiment. For both possible outcomes, they were asked to indicate the probability that that outcome would be obtained on a specified number of additional replications, if it were obtained on the first subject. Hindsight subjects were told that the first subject had already been tested and had produced one of the possible outcomes. They were asked how likely it was that this outcome would be obtained on the same specified number of additional replications.


Thus, both groups were asked to assess the probability of a number of future replications, conditional on the outcome obtained from a first subject. Formally, these conditional probabilities should be the same for subjects in both groups. We hypothesized, however, that hindsight subjects, told the outcome obtained on the first subject, would exaggerate its inevitability and thus the probability that it would be replicated on future trials. Foresight subjects, we believed, would be less sanguine about the prospects of successful replications. One reason for such an effect is that foresight is a perspective conducive to seeing how an experiment could go either way, whereas in hindsight, we may be so intent on explaining the reported result that we can no longer see how the experiment could, in past or future, have gone otherwise.
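The formal equivalence can be illustrated with a small Bayesian calculation. The Python sketch below is our illustration, not part of the experiments: it assumes that the initial outcome recurs independently on each trial with some unknown rate q, and (arbitrarily) that q is uniformly distributed before the first subject is tested. Under any model of this kind, the probability of the outcome recurring on all, some, or none of n further replications depends only on the first-subject outcome and the prior, so a foresight judge asked "if it occurs" and a hindsight judge told "it occurred" should arrive at the same numbers.

```python
from fractions import Fraction


def replication_probs(n, prior_a=1, prior_b=1):
    """Return (P(all), P(some), P(none)) for n further replications.

    Illustrative model (an assumption, not the authors' analysis): the
    initial outcome recurs on each trial independently with unknown rate q,
    and q ~ Beta(prior_a, prior_b) before any subject is tested.
    """
    # One observed occurrence turns the Beta(a, b) prior into Beta(a + 1, b).
    a, b = prior_a + 1, prior_b
    p_all = Fraction(1)   # E[q**n] under the posterior
    p_none = Fraction(1)  # E[(1 - q)**n] under the posterior
    for k in range(n):
        p_all *= Fraction(a + k, a + b + k)
        p_none *= Fraction(b + k, a + b + k)
    return p_all, 1 - p_all - p_none, p_none


# Foresight: "If the outcome occurs on the first subject, how likely ...?"
foresight = replication_probs(10)
# Hindsight: "The outcome occurred on the first subject; how likely ...?"
hindsight = replication_probs(10)
print(foresight == hindsight)  # True: both condition on the same event
print(foresight)  # (Fraction(1, 6), Fraction(9, 11), Fraction(1, 66))
```

The particular values (1/6, 9/11, and 1/66 for ten replications) follow from the assumed uniform prior and would change under a different one; what cannot change is that the foresight and hindsight questions condition on the same event.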

For the sake of conceptual clarity, it may be valuable to consider the relationship of this experiment to Tversky and Kahneman's (1971, 1973; also Kahneman & Tversky, 1973) finding that people are insensitive to the amount and quality of the information on which their judgments are based. In particular, people are willing to make very confident predictions on the basis of very limited samples. These results predict that both foresight and hindsight subjects will have undue confidence in the replicability of the outcome obtained from the first subject in each study. They do not predict different degrees of overconfidence for foresight and hindsight subjects.

EXPERIMENT I
Method 

Design 

Subjects received brief descriptions of experiments drawn from biology, psychology, and meteorology, which they were told either would soon be conducted (foresight) or had recently been conducted (hindsight). Foresight subjects were told that two outcomes were possible with the first subject in each experiment, while hindsight subjects were told that one of those two possible outcomes had been obtained. Foresight subjects were asked to: (a) assign a probability to each of the possible first-subject outcomes; (b) explain why each outcome might occur; and (c) estimate the probability that each of the two possible outcomes would be replicated in all, some, or none of a fixed number of replications if it were obtained with the initial subject. Hindsight subjects were asked to: (a) explain why the reported outcome had occurred; and (b) estimate the probability that it would be obtained in all, some, or none of the replications. The dependent variable for all groups was the conditional probability of replicating the outcome obtained with the initial subject.

Stimuli 

The descriptions of the four experiments, along with the possible outcomes considered and the number of replications, were as follows:¹

Virgin rat: Several researchers intend to perform the following experiment. They will inject blood from a mother rat into a virgin rat immediately after the mother rat has given birth. After the injection, the virgin rat will be placed into a cage with the newly born baby rats, after removal of the actual mother.

Outcomes used: (a) The virgin rat exhibited maternal behavior; (b) the virgin rat failed to exhibit maternal behavior. Subjects estimated the probability of the initial result being replicated with all, some, or none of 10 additional virgin rats.

Hurricane seeding: A team of government meteorologists recently seeded a tropical storm, which had reached hurricane status, with large quantities of silver-iodide crystals (the same type of crystals used to seed clouds in attempts to produce rain).

Outcomes used: (a) The hurricane increased in intensity; (b) the hurricane decreased in intensity. Subjects estimated the probability of the initial result being replicated in all, some, or none of six additional hurricanes.

Gosling imprinting: A goose egg was placed in a sound-proof, heated box from time of laying to time of cracking. Approximately two days before it cracked, the experimenter began to intermittently play sounds of ducks quacking into the box. On the day after birth, the gosling was placed on a smooth floor equidistant from a duck and a goose, each of which was in a wire cage. The gosling was observed for two minutes.

Outcomes used: (a) The gosling approached the caged duck; (b) the gosling approached the caged goose. Subjects estimated the probability of the initial result being replicated with all, some, or none of 10 additional goslings.

Y-test: In the pretest of an experiment that she intends to run in the future, an experimenter placed a four-year-old child in front of an easel with a large Y on it, with a dot in the lower left-hand third. The child was then taken around to the back of the easel, where he saw another Y. He was asked to draw a dot in the "same position" on that Y as the one he had just seen.

Outcomes used: (a) The child placed a dot in Area A [the lower left-hand third]; (b) the child placed a dot in Area B [the lower right-hand third]. Subjects estimated the probability that the initial result would be replicated with one additional child.

The hurricane seeding experiment was loosely based on Howard, Matheson, and North (1972); the imprinting study on Grier, Counter, and Shearer (1967); and the Y-test on Smothergill, Hughes, Timmons, and Hutko (1975). The virgin rat study was invented.

¹ For stylistic purposes, the tenses of the verbs used in these descriptions varied between experiments and between hindsight and foresight versions of the same experiment. Fischhoff (1976) has found that the tense used in describing events has no effect on their perceived likelihood.

The virgin rat experiment was presented to one set of foresight and hindsight groups. The other three experiments were presented together to a second set of foresight and hindsight groups.


Instructions 

All subjects received the same general instruction: "The following questionnaire concerns your scientific intuitions. We'd like to ask you a number of questions about possible results of several experiments in different areas which have recently been conducted, or will be in the near future. We thank you for your cooperation."

Each experiment appeared on a separate page, with the description at the top. Questions were presented in the following format (using the virgin rat example):

Foresight:

1.a. What is the probability that the virgin rat will exhibit maternal behavior? Why do you think that this might happen?

  b. What is the probability that the virgin rat will not exhibit maternal behavior? Why do you think that this might happen?

2. If the virgin rat does exhibit maternal behavior, what is the probability that in a replication of this experiment with ten additional virgin female rats:

  a. All will exhibit maternal behavior?
  b. Some will exhibit maternal behavior?
  c. None will exhibit maternal behavior?

  (Note: These three probabilities should total 100%.)

3. Identical to Question 2 except that it begins "If the virgin rat does not exhibit maternal behavior ..."

Hindsight (after being told either that the initial virgin rat exhibited maternal behavior or that it failed to exhibit maternal behavior):

1. Why do you think that this happened?

2. What is the probability that in a replication of this experiment with ten additional virgin female rats:

  a. All will exhibit maternal behavior?
  b. Some will exhibit maternal behavior?
  c. None will exhibit maternal behavior?

  (Note: These three probabilities should total 100%.)


Subjects 

All 184 subjects were paid volunteers who responded to an ad in the University of Oregon student newspaper. The present task was the first of several performed during a two-hour session. Group size varied from 24 to 37.


Results 

The first and third columns of Table 1 present the mean probability of replication assigned by the foresight and hindsight groups in Experiment I. The italicized rows of Table 1 (rows 1, 6, 7, 10, 13, 16, 19, and 23) present the mean judged probability of the initial outcome's being obtained on all subsequent replications. In six of eight cases (two from each of four experiments), this probability was significantly larger for hindsight than for foresight subjects. Thus, subjects told that an experiment had "worked" once in the past found its working consistently in the future more likely than did those asked, "If it works once, how likely is it to work again consistently?" For the three experiments with multiple replications (virgin rat, hurricane seeding, gosling imprinting), the mean probability of an initial outcome's always being replicated was .383 for the foresight group and .546 for the hindsight group; the mean probabilities of its never being replicated were .187 and .095, respectively.
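Because subjects were told that their three probabilities should total 100%, the mean probability that the initial outcome would be replicated on only some trials can be inferred from the two means just reported, provided every subject's answers did sum to 100% (the mean of the sums then equals the sum of the means). A short Python check of that arithmetic:

```python
# Pooled means reported for the virgin rat, hurricane seeding, and gosling
# imprinting experiments (Experiment I).
reported = {
    "foresight": {"all": 0.383, "none": 0.187},
    "hindsight": {"all": 0.546, "none": 0.095},
}


def implied_some(means):
    """Mean P(some), assuming each subject's three answers summed to 1."""
    return 1.0 - means["all"] - means["none"]


for group, means in reported.items():
    print(f"{group}: all={means['all']:.3f} "
          f"some={implied_some(means):.3f} none={means['none']:.3f}")
```

On this reading, the hindsight group's .163 increase in the probability of consistent replication was drawn from both of the other categories: about .071 from "some" and .092 from "none".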
Discussion

Why should these formally equivalent conditional probabilities be judged differently by hindsight and foresight subjects? Two possibilities occur to us. One, suggested in the introduction, is that hindsight subjects unduly concentrate their attention on the reported outcome, thereby failing to see how the initial experiment could have gone the other way. A second possibility is that the conditional judgments that the foresight subjects make ("If it were to work once, what is ...?") are quite difficult and confusing. Thus, when they attempt to consider the possible occurrence of two different outcomes, they may be unable to devote to either outcome the attention that hindsight subjects give to their single alternative. As a result, foresight subjects may be unable to properly assess the impact that the result from the first subject should have on their perceptions. In summary, the "availability-of-reasons" explanation attributes the discrepancy to hindsight subjects' failure to consider the feasibility of alternative outcomes. The "conditionality" explanation attributes the effect to the inability of foresight subjects to consider multiple contingencies. Both may be true.

Experiment II tests these hypotheses by replicating Experiment I with the following differences: (a) foresight subjects were required to consider the probability of replicating only one of the possible outcomes; (b) hindsight subjects were required to explain not only why the reported outcome happened, but also "Had the experiment worked out the other way, how would you explain it?" These one-alternative foresight subjects should be able to devote the same undivided attention to their one possible outcome that the one-alternative hindsight subjects in Experiment I could devote to their one reported outcome. If the conditionality hypothesis is correct, they should respond more like the one-alternative hindsight subjects than like the two-alternative foresight subjects in Experiment I. According to the "availability-of-reasons" hypothesis, hindsight subjects forced to consider why the unreported outcome might have occurred should respond like foresight subjects.


EXPERIMENT II


Method 

Experiment II was identical to Experiment I except for two changes. The first was that foresight subjects estimated the probability of replicating only one of the two possible outcomes for each experiment. They were asked either "If the experiment works, how likely is that result to be replicated?" or "If the experiment doesn't work, how likely is that result to be replicated?" The two-alternative foresight group in Experiment I answered both these questions. Second, the two-alternative hindsight group of Experiment II was asked not only "Why did the experiment work out this way?" but also "Had the experiment worked out the other way, how would you explain it?" Like the one-alternative hindsight subjects of Experiment I, they estimated the probability of replication only for the reported outcome. One hundred and fifty-one subjects were recruited in the same manner as in Experiment I.


Results 


Columns 2 and 4 of Table 1 present the mean probabilities from Experiment II. Comparing columns 1 and 2, we see that the responses of one- and two-alternative foresight subjects were generally indistinguishable. Reducing the number of alternatives considered did not systematically increase the perceived probability of replicating the initial outcome. In only 6 of 24 cases was the one-alternative foresight mean closer to the one-alternative hindsight mean than to the two-alternative foresight mean. Thus, there is no evidence that attentional problems are responsible for the hindsight-foresight discrepancy.

The second manipulation, forcing two-alternative hindsight subjects to consider how the first trial of the experiment could have turned out otherwise, produced a marked difference. A comparison of column 4 with columns 1 or 2 reveals a substantial hindsight effect for 5 of the 8 outcomes considered (all but 1b, 2a, and 3b). The size of the effect, however, was reduced. For four of the eight outcomes (1a, 2a, 3a, 4a), the mean probability of consistently replicating the reported outcome was significantly lower for two-alternative than for one-alternative hindsight subjects. In general, the means of the two-alternative hindsight group lie between those for the one-alternative hindsight group and both foresight groups. Although not inconsistent with the conditionality hypothesis, these results strongly support the "availability-of-reasons" hypothesis.

Further evidence of the effect of reason availability on probability judgments was sought by looking at those two-alternative hindsight subjects unable to supply reasons for one of the two alternative outcomes. Subjects who could not think of a single reason why the unreported outcome might have happened found replication of the reported outcome slightly more likely than did other subjects (mean difference = .021); subjects who could not think of a single reason for the reported outcome found replication much less likely than did other subjects (mean difference = .211).

Discussion of Experiments I and II

Although these results seem to support the "availability-of-reasons" account of the hindsight-foresight discrepancies, the evidence is inconclusive. The possibility remains that conditional tasks, however structured, cause difficulties. There appear to be few, if any, empirical studies germane to this problem. Aside from its theoretical interest, the question of conditional judgments has significant applied implications. If we are to engage effectively in contingency planning, we must be able to assess, in advance, the impact that receipt of various possible data may have on our perceptions. If these conditional judgments are inaccurate, the plans based on them may appear grossly inappropriate when dimly foreseen contingencies do arise (Brown, 1976).

If the hindsight effect found in Experiments I and II afflicts researchers, it may constitute an important impediment to scientific progress. When planning an experiment, investigators, like our foresight subjects, may be able to see that various results are possible and that they should not put undue confidence in results from a few initial subjects. However, once a (any?) result has been obtained on pilot trials, they may throw caution to the wind and view that result as highly likely and easily replicable. As a result, they may reduce the size and power of the ensuing sample, a step whose consequences have been noted by Cohen (1969) and by Tversky and Kahneman (1971).
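How costly it can be to trust a few pilot subjects is easy to illustrate with an exact binomial calculation. In the Python sketch below, the one-sided sign test, the .05 significance level, and the true rate of .7 for the outcome of interest are hypothetical choices made for illustration only; none of them comes from the experiments reported here. The sketch computes the power of the test for a sample of 10 and a sample of 40.

```python
from math import comb


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p); returns 0 when k > n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_value(n, alpha=0.05, p0=0.5):
    """Smallest k such that P(X >= k) <= alpha under the null rate p0."""
    k = 0
    while binom_tail(n, k, p0) > alpha:
        k += 1
    return k


def power(n, p_true, alpha=0.05, p0=0.5):
    """Probability that a one-sided exact binomial test rejects p0."""
    return binom_tail(n, critical_value(n, alpha, p0), p_true)


# Hypothetical effect: the outcome actually occurs on 70% of trials.
for n in (10, 40):
    print(n, critical_value(n), round(power(n, 0.7), 3))
```

Under these assumptions, a sample of 10 detects the effect only about 15% of the time (power of roughly .149), while the sample of 40 detects it far more often.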


When we act as reviewers, this bias may lead us to denigrate worthy contributions, believing, like the reviewers cited at the beginning of this paper, that they are trivial, obvious, and foreseeable. Before speculating further on the implications of this hindsight bias for the journal review process, let us consider some evidence acquired in a setting more closely resembling that process. Experiment III replicates Experiments I and II in a journal review format.

EXPERIMENT III
Method 

Design 

Subjects were asked to read and evaluate scientific manuscripts in a manner similar to that of professional reviewers. Hindsight reviewers received manuscripts with introduction, method, and results sections. For foresight subjects, the results section was missing. Each manuscript was composed so that there were two possible outcomes for the study in question. There were two separate hindsight groups, each receiving one of the possible outcomes presented as if it had actually happened.

Subjects were asked to evaluate the manuscripts on seven 7-point scales, two of which were designed to be sensitive to hindsight-foresight differences. One was surprisingness of results: hindsight subjects assessed the surprisingness of the reported outcome; foresight subjects assessed how surprising each of the two possible outcomes would seem were it obtained. The second sensitive question was stability of results: hindsight subjects assessed the likelihood that the reported results would be obtained in an exact replication of the same experiment; foresight subjects answered the same question for each of the possible results. The remaining five scales were used as fillers and to test for other possible changes between hindsight and foresight. They referred to: clarity of the introduction, clarity of the research question, clarity of the method, adequacy of the method to test the research question, and personal interest in the study.

Stimuli²

Three experiments from diverse areas of psychology were used. One, called "Scientific ambiguity and attitudinal conflict," described an unpublished experiment which we had recently completed. In that experiment, subjects first indicated their position on several environmental issues, including nuclear power; some time later, they were asked to guess whether an ambiguous statement about nuclear power was offered by an opponent or a proponent of nuclear power. We had hypothesized that people would interpret ambiguous statements as supporting their own positions, but we were wrong.

The second and third studies were "elaborated" versions of the gosling imprinting and Y-test studies used in Experiments I and II. No hypothesis was advanced for either of these studies.

These studies were chosen to be unfamiliar, yet comprehensible without prior knowledge of the area. They were written to show that there were two possible outcomes, each of which could conceivably be obtained.

² Copies of these descriptions and the accompanying questionnaires are available upon request.


Procedure 


Subjects were told about the review process for scientific manuscripts and then asked to perform a task similar to that of actual reviewers. They read the three studies in the order given above, evaluating each before going on to the next.


Subjects 

One hundred twenty-eight paid subjects participated, responding to an advertisement in the University of Oregon student newspaper. They were assigned to the foresight group or one of the two hindsight groups according to their preference for experimental date and hour.


Results 

If these reviewers are susceptible to a hindsight bias, the hindsight subjects should find the reported results less surprising and more likely to be replicated than the foresight subjects anticipated they would be. Table 2 presents the relevant group means for the two outcomes used for each of the three experiments. In five of the six cases, hindsight subjects found the reported outcome less surprising and more replicable than did foresight subjects; in three of six cases, this difference was statistically significant. There were no systematic differences on the five filler questions.

Table 2. Mean Ratings for Experiment III

GENERAL DISCUSSION

Experiment III did not allay our concern. When we act as reviewers, hindsight bias may lead us to denigrate worthy contributions, believing, like the reviewers cited at the beginning of this paper, that they are trivial, obvious, and foreseeable. An extreme measure to counter this tendency would be to institute a system of deaf review, forcing the reviewer to make some evaluations without hearing the results. Under this system, the reviewer receives the introduction and method sections of an article without the results. After reading these sections, he or she makes a written prediction of the outcome of the experiment. Once this prediction has been returned to the editor, the remainder of the manuscript is sent. The written record can alert both editor and reviewer to what results could be anticipated. For such a scheme to be successful, authors would have to give no hint of their results in the introduction and method sections.
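Stripped to its essentials, deaf review is an ordering rule on what the editor sends out. The Python sketch below models that rule as a small editor-side record; the class and method names are hypothetical, and the sketch leaves aside everything else an editorial office would track. Its only job is to show that the results section cannot be released until a written prediction is on file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Manuscript:
    introduction: str
    method: str
    results: str


@dataclass
class DeafReview:
    """Hypothetical editor-side record for one reviewer of one manuscript."""

    manuscript: Manuscript
    prediction: Optional[str] = None

    def first_mailing(self) -> dict:
        # Stage 1: the reviewer sees only the introduction and method.
        return {"introduction": self.manuscript.introduction,
                "method": self.manuscript.method}

    def record_prediction(self, text: str) -> None:
        # Stage 2: the written prediction is returned to the editor.
        if not text.strip():
            raise ValueError("a written prediction is required")
        self.prediction = text

    def second_mailing(self) -> str:
        # Stage 3: the results go out only once a prediction is on file.
        if self.prediction is None:
            raise RuntimeError("results withheld until a prediction is recorded")
        return self.manuscript.results
```

Because the prediction is kept in the record, the editor can later set it beside the reviewer's judgment of how surprising the reported results were.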


Given the demands currently placed on journal editors, reviewers, and authors, this cumbersome proposal has little chance of being accepted. For many, possibly most, manuscripts, such a procedure would also be unnecessary, because these manuscripts are rejected on technical grounds independent of their informativeness. Only if a submission is methodologically competent, tolerably written, and sent to the proper journal does its fate depend on its scientific substance. Thus, hindsight bias, when present, will primarily affect a subset of the best submitted manuscripts.

How do we protect these manuscripts from unfair rejection? The reduced hindsight effect with the two-alternative hindsight group in Experiment II suggests one solution: have reviewers provide reasons pointing to the result that was not obtained. Such a proposal seems implicit in the editorial policy recently set forth by one APA journal:

The author and reader of a research report should both feel it possible to make a convincing case that the results of the reported research could have been interesting if they had come out differently from those reported. Publication in JPSP is inappropriate when it is not possible to imagine any reasonable basis for finding results other than the ones reported. (Greenwald, 1976, p. 4)

There is, however, no guarantee that this simple procedure will do the job. In Experiment II the debiasing was only partial, and substantial hindsight effects were still obtained with five of the eight outcomes. More research is clearly needed. Until the extent of this bias is known and techniques for eliminating it are developed, we might do well to reject, or at least reduce the importance of, informativeness-surprisingness as a criterion in the manuscript review process.